First we can look at the concentrations in each zip code and day.
Next we can look at the cases in each zip code and day.
Next we’ll classify each day-zip combination as experiencing some wildfire exposure or not. Wildfire period is defined as any day where the zip code has a PM2.5 value greater than zero (note: some are very low). See how the WF period(s) is/are more clearly the second plot which uses a threshold of 1, instead of 0.
Now we can begin to calculate RR, first for the entire zip codes (not race stratified). We’ll use a constant of o.1 to fill in spots where there are no cases during the exposure or control periods.
constant <- 0.1
rates_overall <- foo %>%
mutate(total = white + hispanic + black + native_am + asian_pi + other + missing) %>%
group_by(zip, wildfire) %>%
summarise(cases = sum(total),
days = length(svc_from_dt)) %>%
mutate(casesPerDay = cases/days)
RR_overall <- foo %>%
mutate(total = white + hispanic + black + native_am + asian_pi + other + missing) %>%
group_by(zip, wildfire) %>%
summarise(cases = sum(total),
days = length(svc_from_dt)) %>%
mutate(cases = ifelse(cases == 0, constant, cases),
casesPerDay = cases/days) %>%
select(zip, wildfire, casesPerDay) %>%
spread(key = wildfire, value = casesPerDay) %>%
mutate(RateRatio = WFperiod/nonWFperiod,
ln_RateRatio = log(RateRatio)) %>%
rename(WF_rate = WFperiod,
nonWF_rate = nonWFperiod)
DT::datatable({rates_overall}) %>% DT::formatRound(5,5)
DT::datatable({RR_overall}) %>% DT::formatRound(c(2:5),3)
Plot Overall rate ratios
## Calculating race-specific RRNow we can begin to calculate race-specific RR.
RR_race <- foo %>%
group_by(zip, wildfire) %>%
summarise(white = sum(white),
hispanic = sum(hispanic),
black = sum(black),
native_am = sum(native_am),
asian_pi = sum(asian_pi),
other = sum(other),
days = length(svc_from_dt)) %>%
gather(white, hispanic, black, native_am, asian_pi,other, key = race, value = cases) %>%
mutate(cases = ifelse(cases == 0, constant, cases),
casesPerDay = cases/days) %>%
select(zip, wildfire, casesPerDay, race) %>%
spread(key = wildfire, value = casesPerDay) %>%
mutate(RateRatio = WFperiod/nonWFperiod,
ln_RateRatio = log(WFperiod/nonWFperiod))
DT::datatable({RR_race}) %>% DT::formatRound(c(3:6),3)
Look at the distrubutions of race_specific rate ratios, need to look into why so many of these are focused around 1.5…?
Here’s the same data in a BoxPlot
foo %>%
group_by(zip, wildfire) %>%
summarise(white = sum(white),
hispanic = sum(hispanic),
black = sum(black),
native_am = sum(native_am),
asian_pi = sum(asian_pi),
other = sum(other),
days = length(svc_from_dt)) %>%
gather(white, hispanic, black, native_am, asian_pi,other, key = race, value = cases) %>%
mutate(cases = ifelse(cases == 0, constant, cases),
casesPerDay = cases/days) %>%
select(zip, wildfire, casesPerDay, race) %>%
spread(key = wildfire, value = casesPerDay) %>%
mutate(RateRatio = WFperiod/nonWFperiod,
ln_RateRatio = log(WFperiod/nonWFperiod)) %>%
ggplot() + geom_boxplot(aes(x= race, y= ln_RateRatio))+ geom_jitter(aes(x= race, y= ln_RateRatio, color=race), alpha=0.2) + guides(color=FALSE)
Combine with other covariates, calculate the weights and